Voice Biometrics by García-Mateo Carmen;Chollet Gérard;

Author:García-Mateo, Carmen;Chollet, Gérard;
Language: eng
Format: epub
Publisher: Institution of Engineering & Technology
Published: 2021-07-30T16:00:00+00:00


5.5 Evaluation corpora

Two corpora were used to evaluate the performance of the speaker de-identification approaches described in Section 5.4. One includes many speakers with little training data each, whereas the other includes few speakers with more training data each. Using both corpora makes it possible to evaluate how the performance of these speaker de-identification techniques depends on the amount of training data, and to assess the relevance of having more or fewer speakers available as source speakers for speaker-independent de-identification.

The first dataset is the Voice Cloning Toolkit (VCTK) Corpus [88]. The VCTK Corpus was designed for speech synthesis applications, but its characteristics make it suitable for VC (and therefore de-identification) purposes. This corpus includes more than 100 speakers with various accents, each of whom recorded around 400 utterances. These utterances include the Rainbow Passage [89], an elicitation paragraph, and sentences selected from a newspaper, the latter being different for each speaker. Since the techniques used in these experiments require a parallel corpus for training VC functions, the elicitation paragraph plus the Rainbow Passage were used for training, whereas the newspaper sentences were used for testing. Note that not all the training sentences are available for all the speakers, so the number of training utterances differs slightly from speaker to speaker.
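Because some speakers lack some of the shared training sentences, the parallel training material for a given source-target pair is limited to the sentences that both speakers recorded. The following is a minimal sketch of that selection step, not the authors' implementation: the function name `parallel_training_prompts`, the prompt identifiers, and the toy data are all hypothetical and are not taken from the corpus.

```python
from typing import Dict, List, Set

# Hypothetical structure: for each speaker, the identifiers of the shared
# prompts (Rainbow Passage plus elicitation paragraph) that were recorded.
SharedPrompts = Dict[str, Set[str]]


def parallel_training_prompts(
    recorded: SharedPrompts, source: str, target: str
) -> List[str]:
    """Return the shared prompts recorded by both speakers, in sorted order."""
    return sorted(recorded[source] & recorded[target])


if __name__ == "__main__":
    # Toy data (an assumption for illustration): speaker "B" lacks prompt "s03".
    recorded = {
        "A": {"s01", "s02", "s03", "s04"},
        "B": {"s01", "s02", "s04"},
    }
    print(parallel_training_prompts(recorded, "A", "B"))
    # prints ['s01', 's02', 's04']
```

Taking the intersection per pair, rather than one global subset common to all speakers, keeps as much parallel data as possible for each conversion function.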

The second dataset used in these experiments is that of the Voice Conversion Challenge 2016 (VCC 2016) [90]. This corpus is based on the Device and Produced Speech (DAPS) dataset [91], a freely available corpus recorded by professional US English speakers in a recording studio. Specifically, the “clean” version of the dataset was used in VCC 2016. It includes around 13 min of speech from each of the ten speakers selected for this corpus, split into training and test sets. The utterances are sentences from public domain books (novels such as Alice’s Adventures in Wonderland, Twenty Thousand Leagues Under the Seas, and Treasure Island, among others). All the speakers recorded the same sentences, so there is a parallel corpus for every pair of speakers.

Some statistics of the VCTK and VCC 2016 corpora, as used in these experiments, are summarized in Table 5.2.

Table 5.2 Experimental framework

                                                        VCTK          VCC 2016
# of speakers                                           109           10
Average # of training utterances                        23            162
Average # of test utterances                            383           54
Average duration of training utterances (per speaker)   2 min 30 s    9 min 41 s
Average duration of test utterances (per speaker)       21 min 44 s   2 min 57 s
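The figures in Table 5.2 can be combined into two rough derived quantities: the mean length of a single utterance and the total amount of speech per speaker. The sketch below only redoes this arithmetic on the rounded values from the table, so the results are approximations that the text itself does not state; the helper names are illustrative.

```python
def seconds(minutes: int, secs: int) -> int:
    """Convert a 'min s' duration into seconds."""
    return 60 * minutes + secs


# Values copied from Table 5.2:
# (average number of utterances, average duration per speaker in seconds)
TABLE_5_2 = {
    ("VCTK", "train"): (23, seconds(2, 30)),
    ("VCTK", "test"): (383, seconds(21, 44)),
    ("VCC 2016", "train"): (162, seconds(9, 41)),
    ("VCC 2016", "test"): (54, seconds(2, 57)),
}


def mean_utterance_seconds(corpus: str, split: str) -> float:
    """Approximate mean utterance length: duration divided by utterance count."""
    count, duration = TABLE_5_2[(corpus, split)]
    return duration / count


def total_seconds_per_speaker(corpus: str) -> int:
    """Training plus test duration per speaker."""
    return TABLE_5_2[(corpus, "train")][1] + TABLE_5_2[(corpus, "test")][1]


if __name__ == "__main__":
    for (corpus, split) in TABLE_5_2:
        print(f"{corpus} {split}: {mean_utterance_seconds(corpus, split):.1f} s per utterance")
    for corpus in ("VCTK", "VCC 2016"):
        minutes, secs = divmod(total_seconds_per_speaker(corpus), 60)
        print(f"{corpus}: {minutes} min {secs} s per speaker in total")
```

For VCC 2016 the total comes to 12 min 38 s per speaker, which agrees with the "around 13 min" quoted above.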


